Parallel Computation of Echelon Forms
Authors
Abstract
We propose efficient parallel algorithms and implementations of LU factorization over a finite field on shared-memory architectures. Compared to the corresponding numerical routines, we have identified three main difficulties specific to linear algebra over finite fields. First, the arithmetic complexity could be dominated by modular reductions. It is therefore mandatory to delay these reductions as much as possible while mixing fine-grain parallelizations of tiled iterative and recursive algorithms. Second, fast linear algebra variants, e.g., using the Strassen-Winograd algorithm, never suffer from instability and can thus be widely used in cascade with the classical algorithms. There, trade-offs must be made between block sizes well suited to those fast variants and block sizes well suited to load and communication balancing. Third, many applications over finite fields require the rank profile of the matrix (which is quite often rank deficient) rather than the solution to a linear system. It is thus important to design parallel algorithms that preserve and compute this rank profile. Moreover, as the rank profile is only discovered during the algorithm, the block size has to be dynamic. We propose and compare several block decompositions: tile iterative with left-looking, right-looking and Crout variants, slab and tile recursive. Experiments demonstrate that the tile recursive variant performs better and matches the performance of reference numerical software when no rank deficiency occurs. Furthermore, even in the most heterogeneous case, namely when all pivot blocks are rank deficient, we show that it is possible to maintain a high efficiency.
This work is partly funded by the HPAC project of the French Agence Nationale de la Recherche (ANR 11 BS02 013).
Université de Grenoble. Laboratoire LJK, umr CNRS, INRIA, UJF, UPMF, GINP. 51, av. des Mathématiques, F38041 Grenoble, France.
INRIA. Laboratoire LIG, umr CNRS, INRIA, UJF, UPMF, GINP. 51, av. J. Kuntzmann, F38330 Montbonnot St-Martin, France.
Université de Grenoble. Laboratoire de l’Informatique du Parallélisme umr CNRS, INRIA, UCBL, ÉNS de Lyon. 46 Allée d’Italie, F69364 LYON Cedex 07, France.
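The first difficulty named in the abstract, delaying modular reductions, can be illustrated with a small sketch. It is not the authors' implementation: the function names `max_delay` and `dot_mod_delayed` and the parameter `acc_bits` are hypothetical, and the 53-bit default is only an assumption standing in for an accumulator stored in a double-precision mantissa. The idea shown is that, for inputs already reduced modulo p, up to k products can be summed exactly before a reduction is needed, where k is the largest integer with k(p-1)^2 below the accumulator bound.

```python
def max_delay(p, acc_bits=53):
    """Largest number of products, each at most (p-1)^2, whose sum is
    guaranteed to fit in an accumulator holding values below 2**acc_bits.
    acc_bits=53 is an assumption (mantissa width of a double)."""
    bound = 2 ** acc_bits - 1
    return bound // ((p - 1) ** 2)


def dot_mod_delayed(x, y, p, acc_bits=53):
    """Dot product of x and y modulo p, reducing the accumulated sum once
    per block of max_delay(p) products instead of once per product.
    Entries of x and y are assumed to lie in [0, p-1].
    Returns (value, number_of_block_reductions)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    k = max_delay(p, acc_bits)
    if k < 1:
        raise ValueError("p is too large for this accumulator width")
    result = 0
    reductions = 0
    for start in range(0, len(x), k):
        # Sum at most k products without any reduction; the bound on k
        # keeps this partial sum within the simulated accumulator.
        chunk = sum(a * b for a, b in zip(x[start:start + k], y[start:start + k]))
        # One reduction of the block sum, then a cheap fold into the result.
        result = (result + chunk % p) % p
        reductions += 1
    return result, reductions


if __name__ == "__main__":
    p = 7
    x = [i % p for i in range(20)]
    y = [(3 * i + 1) % p for i in range(20)]
    # A deliberately tiny 8-bit accumulator makes the blocking visible.
    print(dot_mod_delayed(x, y, p, acc_bits=8))
```

With a realistic accumulator width and a word-size prime, k is large, so a blocked matrix product performs far fewer reductions than multiplications; how the block size interacts with tiling and parallel scheduling is the trade-off the abstract refers to.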
Similar resources
Finite Horizon Economic Lot and Delivery Scheduling Problem: Flexible Flow Lines with Unrelated Parallel Machines and Sequence Dependent Setups
This paper considers the economic lot and delivery scheduling problem in a two-echelon supply chain, where a single supplier produces multiple components on a flexible flow line (FFL) and delivers them directly to an assembly facility (AF). The objective is to determine a cyclic schedule that minimizes the sum of transportation, setup and inventory holding costs per unit time without shortage....
Parallel computation framework for optimizing trailer routes in bulk transportation
We consider a rich tanker trailer routing problem with stochastic transit times for chemicals and liquid bulk orders. A typical route of the tanker trailer comprises sourcing a cleaned and prepped trailer from a pre-wash location, pickup and delivery of chemical orders, cleaning the tanker trailer at a post-wash location after order delivery and prepping for the next order. Unlike traditiona...
Weave ElGamal Encryption for Secure Outsourcing Algebraic Computations over ℤp
This paper addresses the secure outsourcing problem for large-scale matrix computation to a public cloud. We propose a novel public-key weave ElGamal encryption (WEE) scheme for encrypting a matrix over the field Zp. The scheme has the echelon transformation property. We can apply a series of elementary row/column operations to transform an encrypted matrix under our WEE scheme into the row/col...
Efficient implementation of low time complexity and pipelined bit-parallel polynomial basis multiplier over binary finite fields
This paper presents two efficient implementations of fast and pipelined bit-parallel polynomial basis multipliers over GF(2^m) by irreducible pentanomials and trinomials. The architecture of the first multiplier is based on a parallel and independent computation of powers of the polynomial variable. In the second structure only even powers of the polynomial variable are used. The par...
Three Dimensional DCT/IDCT Architecture
In this paper, the design and development of a new fully parallel architecture for the computation of the three-dimensional discrete cosine transform (3D DCT) is presented. It can be used for the computation of either the forward or the inverse 3D DCT and is suitable for real-time processing of 2D or multi-view video codecs. The computation of the 3D DCT is carried out using the row-column-frame...
Journal title:
Volume, issue:
Pages: -
Publication date: 2014